Towards Semantic Dataset Profiling
Authors
Abstract
The web of data is growing constantly, both in size and in impact. A prospective data publisher needs summary information about the datasets available on the web, so that she can easily identify where to look for the resources to which her data relates. This information helps her discover candidate datasets for interlinking. In that context, we investigate the problem of dataset profiling. We define a dataset profile as a set of characteristics, both semantic and statistical, that describe a dataset as well as possible while taking into account the multiplicity of domains and vocabularies on the web of data.
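To make the definition concrete, here is a minimal Python sketch of what such a profile might contain. It is an illustration only, not the authors' method: the `DatasetProfile` class, the `profile` and `namespace` helpers, and the choice of characteristics (triple count as a statistical feature; vocabularies and classes used as semantic features) are all assumptions made for this example, and the input is a hand-written list of triples rather than a real dataset.

```python
from collections import Counter
from dataclasses import dataclass, field

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def namespace(iri: str) -> str:
    """Return the IRI up to and including its last '#' or '/' (hypothetical helper)."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else iri


@dataclass
class DatasetProfile:
    """Hypothetical profile: one statistical and two semantic characteristics."""
    triple_count: int = 0
    class_counts: Counter = field(default_factory=Counter)       # instances per class
    vocabulary_counts: Counter = field(default_factory=Counter)  # predicate uses per namespace


def profile(triples) -> DatasetProfile:
    """Build a profile from an iterable of (subject, predicate, object) strings."""
    result = DatasetProfile()
    for _subject, predicate, obj in triples:
        result.triple_count += 1
        result.vocabulary_counts[namespace(predicate)] += 1
        if predicate == RDF_TYPE:
            result.class_counts[obj] += 1
    return result


# Hand-written example data; the example.org resources are placeholders.
EXAMPLE_TRIPLES = [
    ("http://example.org/alice", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/bob", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
]
example_profile = profile(EXAMPLE_TRIPLES)
```

Under these assumptions, two datasets whose profiles share classes or vocabulary namespaces would be natural candidates to inspect for interlinking.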
Similar resources
Towards constructing an Integrative, Multi-Level Model for Cognition: The Function of Semantic Networks
Integrated approaches try to connect different constructs in different theories and reinterpret them using a common conceptual framework. In this research, using the concept of processing levels, an integrated, three-level model of the cognitive systems has been proposed and evaluated. Processing levels are divided into three categories of Feature-Oriented, Semantic and Conceptual Level based o...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
A Persian-English Cross-Linguistic Dataset for Research on the Visual Processing of Cognates and Noncognates
Finding out which lexico-semantic features of cognates are critical in cross-language studies and comparing these features with noncognates helps researchers to decide which features to control in studies with cognates. Normative databases provide necessary information for this purpose. Such resources are lacking in the Persian language. We created a dataset and determined norms for the essenti...
How Would You Say It? Eliciting Lexically Diverse Data for Supervised Semantic Parsing
Building dialogue interfaces for real-world scenarios often entails training semantic parsers starting from zero examples. How can we build datasets that better capture the variety of ways users might phrase their queries, and what queries are actually realistic? Wang et al. (2015) proposed a method to build semantic parsing datasets by generating canonical utterances using a grammar and having ...
Towards a Linked Open Dataset for Scholarly Publishing: Semantic Lancet Project
There is an ever increasing interest in publishing Linked Open Datasets about scientific papers. The current landscape is very fragmented: some projects focus on bibliographic data, others on authorship data, others on citations, and so on. The quality is also heterogeneous and the production and maintenance of such datasets is difficult and time-consuming. In this paper we introduce the Semant...